Derive source-map tuples from Babel's `decodedMap`, reduce cold build CPU ~2.5% by robhogan · Pull Request #1741 · react/metro

robhogan · 2026-06-23T14:54:43Z

Summary:
Metro's transform worker currently returns source maps from Babel's transform result via `result.rawMappings.map(toSegmentTuple)`. This *used* to be (as the name suggests) Babel's own source map representation, and was therefore free to access.

However, since babel/babel#14497 (`@babel/generator` since `v7.17.10`), `rawMappings` is a getter that provides the old structure for backwards compatibility. Accessing `result.rawMappings` forces `@babel/generator` to run a second decode (`allMappings`) that allocates a flat array of ~4-5 objects per segment.
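For illustration, a minimal sketch of what the old path amounts to, assuming the segment layout of a decoded source map (`[genCol]`, `[genCol, srcIdx, srcLine, srcCol]`, or `[genCol, srcIdx, srcLine, srcCol, nameIdx]`, all 0-based). `expandMappings` is a hypothetical stand-in for the `allMappings` decode, not Babel's code, and `toSegmentTuple` is written to mirror Metro's helper of that name rather than copied from it. Note the per-segment `mapping`, `generated`, and `original` objects that exist only to be collapsed back into a tuple.

```javascript
// Illustrative sketch of the old path (hypothetical helpers, not Babel's or Metro's source).
// Decoded segments: [genCol] | [genCol, srcIdx, srcLine, srcCol] | [..., nameIdx], all 0-based.

// Stand-in for the `allMappings` decode behind the `rawMappings` getter:
// one object per segment, plus nested `generated` and `original` objects.
function expandMappings(decodedMap) {
  const out = [];
  decodedMap.mappings.forEach((line, lineIndex) => {
    for (const seg of line) {
      const mapping = {
        generated: {line: lineIndex + 1, column: seg[0]}, // 1-based line
        source: undefined,
        original: undefined,
        name: undefined,
      };
      if (seg.length !== 1) {
        mapping.source = decodedMap.sources[seg[1]];
        mapping.original = {line: seg[2] + 1, column: seg[3]}; // 1-based line
        if (seg.length === 5) {
          mapping.name = decodedMap.names[seg[4]];
        }
      }
      out.push(mapping);
    }
  });
  return out;
}

// Collapses one mapping object into a Metro-style tuple:
// [line, col] | [line, col, srcLine, srcCol] | [line, col, srcLine, srcCol, name]
function toSegmentTuple(mapping) {
  const {line, column} = mapping.generated;
  const {original, name} = mapping;
  if (original == null) {
    return [line, column];
  }
  if (typeof name !== 'string') {
    return [line, column, original.line, original.column];
  }
  return [line, column, original.line, original.column, name];
}

// Old path: materialize every intermediate object, then map to tuples.
function tuplesViaRawMappings(decodedMap) {
  return expandMappings(decodedMap).map(toSegmentTuple);
}

const sample = {
  sources: ['input.js'],
  names: ['foo'],
  mappings: [[[0, 0, 0, 0], [4, 0, 0, 4, 0]], [], [[2]]],
};
console.log(JSON.stringify(tuplesViaRawMappings(sample)));
// prints [[1,0,1,0],[1,4,1,4,"foo"],[3,2]]
```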

The better alternative now is to use `result.decodedMap`, which is eagerly computed and free to access. To accommodate its different structure, we introduce `tuplesFromBabelDecodedMap` (decoded source lines are 0-based, so we add 1; name indices are resolved against `decodedMap.names`).
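A minimal sketch of that conversion, assuming the decoded segment layout `[genCol]`, `[genCol, srcIdx, srcLine, srcCol]`, or `[genCol, srcIdx, srcLine, srcCol, nameIdx]` (all 0-based). It reuses the function name from this diff but is not the diff's code; the loop structure and the `sample` map are illustrative. Generated lines come from the outer array index, so they need +1 as well; columns stay 0-based, and the source index is dropped because Metro's tuples carry no source.

```javascript
// Sketch: convert a Babel-style decoded map straight into Metro-style tuples.
// Illustrative only; not the implementation in this diff.
// Tuples: [line, col] | [line, col, srcLine, srcCol] | [line, col, srcLine, srcCol, name]
function tuplesFromBabelDecodedMap(decodedMap) {
  const {mappings, names} = decodedMap;
  const tuples = [];
  for (let lineIndex = 0; lineIndex < mappings.length; lineIndex++) {
    const line = lineIndex + 1; // generated line: 0-based index -> 1-based
    for (const seg of mappings[lineIndex]) {
      if (seg.length === 1) {
        tuples.push([line, seg[0]]); // generated position only
      } else if (seg.length === 4) {
        // seg[1] is the source index, which the tuple format does not carry.
        tuples.push([line, seg[0], seg[2] + 1, seg[3]]); // source line -> 1-based
      } else {
        tuples.push([line, seg[0], seg[2] + 1, seg[3], names[seg[4]]]);
      }
    }
  }
  return tuples;
}

const sample = {
  names: ['foo'],
  mappings: [[[0, 0, 0, 0], [4, 0, 0, 4, 0]], [], [[2]]],
};
console.log(JSON.stringify(tuplesFromBabelDecodedMap(sample)));
// prints [[1,0,1,0],[1,4,1,4,"foo"],[3,2]]
```

Each segment allocates exactly one tuple and no intermediate objects. For well-formed segments, the result should match what `rawMappings.map(toSegmentTuple)` returns for the same map.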

Transformer output is byte-identical to `result.rawMappings.map(toSegmentTuple)`; it is simply produced more efficiently.

## Microbenchmark

- Real `@babel/generator` 7.29.1 over 133 modules / ~30.6K segments, `--expose-gc`, taking the median of 11 repeats to discount GC outliers, etc.

| Path | CPU (ms/pass) | Transient heap | Notes |
|---|---|---|---|
| New: `generate()` + `decodedMap` | 19.2 | 13.9 MB | eager, already computed; free |
| Old: `generate()` + `rawMappings` | 28.8 | 19.5 MB | triggers `allMappings` decode |
| **Saving** | **−9.6 ms (−33%)** | **−5.6 MB (−29%)** | per pass over 30.6K segments |
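A minimal timing harness in the spirit of the setup above: run a workload a fixed number of times, force GC before each repeat when `--expose-gc` is available, and report the median CPU time (via `process.cpuUsage()`) and median heap growth (via `process.memoryUsage().heapUsed`). The PR does not include its harness; `medianOf`, `measure`, and the stand-in workload are hypothetical, and heap growth measured this way is only an approximation of transient allocation, since GC can still run mid-workload.

```javascript
// Hypothetical micro-benchmark harness (not the PR's script).
// Run with `node --expose-gc bench.js` so that globalThis.gc is defined.

function medianOf(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Runs `work` `repeats` times; returns the median CPU time (ms) and the
// median heap growth (MiB) observed across repeats.
function measure(work, repeats = 11) {
  const cpuMs = [];
  const heapMiB = [];
  for (let i = 0; i < repeats; i++) {
    if (typeof globalThis.gc === 'function') {
      globalThis.gc(); // start each repeat from a freshly collected heap
    }
    const heapBefore = process.memoryUsage().heapUsed;
    const cpuBefore = process.cpuUsage();
    work();
    const cpu = process.cpuUsage(cpuBefore); // delta since cpuBefore, in microseconds
    const heapAfter = process.memoryUsage().heapUsed;
    cpuMs.push((cpu.user + cpu.system) / 1000);
    heapMiB.push((heapAfter - heapBefore) / (1024 * 1024));
  }
  return {cpuMs: medianOf(cpuMs), heapMiB: medianOf(heapMiB)};
}

// Stand-in workload: one small array per "segment". A real run would call
// generate() from @babel/generator and read `.decodedMap` or `.rawMappings`.
const result = measure(() => {
  const tuples = [];
  for (let i = 0; i < 30600; i++) tuples.push([1, i, 1, i]);
  return tuples;
});
console.log(`median CPU ${result.cpuMs.toFixed(2)} ms, heap +${result.heapMiB.toFixed(2)} MiB`);
```

Taking the median rather than the mean keeps a single repeat that happens to absorb a major GC from skewing the result.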

## E2E benchmark - large bundle, cold build

*(AI-driven benchmarks and analysis; real numbers)*

- Interleaved, paired A/B: each of 12 rounds runs one cold build per cell, {baseline, this diff} × {child-process workers, worker threads}.
- Fresh Metro per build, transform cache wiped (cold), `maxWorkers=16`.
- "Transform CPU" = total user+sys CPU across the whole worker process tree.
- "tree RSS" = whole-tree resident set size (captures workers in both modes).
- "graph heap" = main-isolate `heapUsed` post-build (the retained module graph).
- Baseline/this-diff columns are medians; Δ is the paired mean difference with a 95% CI (Student's t, 11 df).
- "n.s." (not significant) = CI includes 0.

Child-process workers (Metro default; 12 paired rounds):

| metric | baseline | this diff | Δ (95% CI) |
|---|---|---|---|
| transform CPU (s) | 625 | 612 | **-16.6 (-2.6%) [-24.7, -8.5]** |
| build wall (s) | 65.9 | 65.6 | -0.5 (-0.7%) n.s. |
| transient tree RSS (GB) | 15.8 | 16.0 | +0.06, n.s. |
| post-build tree RSS (GB) | 15.1 | 15.1 | +0.08, n.s. |
| graph heap, main isolate (GB) | 1.59 | 1.59 | ~0, n.s. |
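The transform CPU row sums user+sys CPU over the whole worker process tree. The PR does not say how that was collected; one way on Linux, sketched here purely as an assumption, is to read `utime` and `stime` (fields 14 and 15 of `/proc/<pid>/stat`, in clock ticks) for Metro's process and each worker, snapshot before and after a build, and subtract. `cpuSecondsFromStat`, `treeCpuSeconds`, the made-up stat lines, and `CLK_TCK = 100` are all assumptions, and enumerating the worker PIDs is not shown. Because a process's stat line aggregates all of its threads, the same reading also covers worker-threads mode.

```javascript
// Hypothetical sketch: user+sys CPU seconds from /proc/<pid>/stat lines (Linux only).
// Assumes CLK_TCK = 100, the usual Linux value; confirm with `getconf CLK_TCK`.
const CLK_TCK = 100;

function cpuSecondsFromStat(statLine) {
  // Field 2 (comm) is parenthesized and may contain spaces, so split after the last ')'.
  // The remaining fields start at field 3 (state): field k lives at index k - 3.
  const rest = statLine.slice(statLine.lastIndexOf(')') + 2).trim().split(/\s+/);
  const utime = Number(rest[14 - 3]); // field 14: user-mode ticks
  const stime = Number(rest[15 - 3]); // field 15: kernel-mode ticks
  return (utime + stime) / CLK_TCK;
}

// Sums CPU over one snapshot of stat lines (e.g. Metro's main process plus its workers).
function treeCpuSeconds(statLines) {
  return statLines.reduce((sum, line) => sum + cpuSecondsFromStat(line), 0);
}

// Made-up stat lines (truncated after field 20) for a parent and one worker.
const snapshot = [
  '4242 (node) S 1 4242 4242 0 -1 4194560 1000 0 0 0 250 50 0 0 20 0 12',
  '4243 (node worker 1) S 4242 4242 4242 0 -1 4194560 500 0 0 0 120 30 0 0 20 0 8',
];
console.log(treeCpuSeconds(snapshot)); // prints 4.5
```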

Worker threads (`unstable_workerThreads`; 12 paired rounds):

| metric | baseline | this diff | Δ (95% CI) |
|---|---|---|---|
| transform CPU (s) | 664 | 653 | -18.6 (-2.8%) [-37.5, +0.3] |
| build wall (s) | 59.8 | 59.5 | -1.2 (-1.9%) n.s. |
| transient RSS (GB) | 13.2 | 12.7 | -0.46 (-3.5%) [-0.81, -0.11] |
| post-build RSS (GB) | 12.3 | 11.9 | -0.45 (-3.7%) [-0.80, -0.10] |
| graph heap, main isolate (GB) | 1.60 | 1.60 | ~0, n.s. |
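The Δ columns are paired means with a Student-t 95% CI over 11 df. A minimal sketch of that computation, assuming 12 paired rounds (so the two-sided critical value is t ≈ 2.201); `pairedCI` and the example data are made up for illustration and are not the PR's numbers. "Significant" follows the rule stated above: the interval excludes 0.

```javascript
// Paired A/B summary: mean of per-round differences with a two-sided Student-t 95% CI.
// T_975_DF11 is the critical value for 11 degrees of freedom, i.e. 12 paired rounds;
// a different number of rounds needs a different constant.
const T_975_DF11 = 2.201;

function pairedCI(baseline, candidate, tCrit = T_975_DF11) {
  if (baseline.length !== candidate.length || baseline.length < 2) {
    throw new Error('need two equal-length samples with at least 2 pairs');
  }
  const diffs = candidate.map((value, i) => value - baseline[i]); // per-round deltas
  const n = diffs.length;
  const mean = diffs.reduce((sum, d) => sum + d, 0) / n;
  const variance = diffs.reduce((acc, d) => acc + (d - mean) ** 2, 0) / (n - 1);
  const halfWidth = tCrit * Math.sqrt(variance / n); // t * standard error
  const lo = mean - halfWidth;
  const hi = mean + halfWidth;
  // "n.s." in the tables corresponds to significant === false (CI includes 0).
  return {mean, lo, hi, significant: lo > 0 || hi < 0};
}

// Made-up data: 12 rounds where the candidate is 1-2 units cheaper each time.
const baseline = Array(12).fill(100);
const candidate = baseline.map((v, i) => v - (i % 2 === 0 ? 2 : 1));
const summary = pairedCI(baseline, candidate);
console.log(summary.mean, summary.significant); // prints -1.5 true
```

Pairing each round's baseline and candidate builds cancels round-to-round machine drift, which is why the interval is built from per-round differences rather than from the two medians.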

Takeaways:

- **Transform CPU drops ~2.6-2.8%, equally in both worker modes.** The point estimates (-16.6 s child-process, -18.6 s threads) agree to within 2 s and their CIs overlap almost entirely, so there is no real asymmetry. This is exactly what the mechanism predicts: the optimization runs *inside* the worker (consuming `decodedMap` instead of forcing the `rawMappings`/`allMappings` decode), so the saving is the same whether the worker is a child process or a thread. (An earlier small-n pass suggested a child-process-only win; that was sampling noise. Threads-mode CPU is just noisier, SD 30 s vs 13 s, which widens its CI without moving the point estimate.)
- Build wall time is ~1-2% lower in both modes but within noise: the CPU saving is spread across 16 workers, so it barely moves the critical path.
- Main-isolate post-build heap (the retained graph of stored tuples) is unchanged in every config: no memory regression, and output is byte-identical.
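As a consistency check on the SD figures quoted in the first takeaway, the per-round SD of the paired differences can be recovered from each reported transform-CPU interval, assuming it was computed as mean ± t·SD/√n with n = 12 and t ≈ 2.201. `sdFromCI` is a hypothetical helper; the inputs are the transform CPU intervals from the two tables.

```javascript
// Back out the per-round SD of paired differences from a reported 95% CI,
// assuming CI = mean ± t * SD / sqrt(n) with n = 12 rounds and t = 2.201 (11 df).
const N_ROUNDS = 12;
const T_975_DF11 = 2.201;

function sdFromCI(lo, hi, n = N_ROUNDS, tCrit = T_975_DF11) {
  const halfWidth = (hi - lo) / 2;
  const standardError = halfWidth / tCrit;
  return {mean: (lo + hi) / 2, sd: standardError * Math.sqrt(n)};
}

// Transform CPU intervals (seconds) from the two tables.
const childProcess = sdFromCI(-24.7, -8.5);
const threads = sdFromCI(-37.5, 0.3);
console.log(childProcess.sd.toFixed(1), threads.sd.toFixed(1)); // prints 12.7 29.7
```

Both values round to the quoted 13 s and 30 s, and the interval midpoints reproduce the reported means (-16.6 s and -18.6 s). The threads-mode upper bound of +0.3 also shows that interval just includes 0.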

Changelog:

 - **[Performance]**: Use Babel's `decodedMap` for ~2.5% faster transforms

Reviewed By: huntie, GijsWeterings

Differential Revision: D108506323

Summary: Scripts and findings for profiling Metro's memory and CPU during bundling, and an end-to-end benchmark of the compact VLQ source-map work stacked on top.

**Methodology:**

- Start Metro with `NODE_ARGS="--expose-gc --inspect=9230" DEV=1 js1 run --prefetch=false`
- WildeBundle URL: `GET http://localhost:8081/xplat/js/RKJSModules/EntryPoints/WildeBundle.bundle?platform=ios&dev=true&app=com.facebook.Wilde`
- RSS profiling via /proc, heap snapshots via Chrome DevTools Protocol
- Graph freed via DELETE to the bundle URL (same as fill-http-cache)

**Scripts added:**

- `fb-metro-cli/memory-investigation/heap-profile.js`: automated CDP-based profiler; captures 3 heap snapshots (baseline, post-build, post-delete) and compares them
- `fb-metro-cli/memory-investigation/heap-compare.js`: standalone snapshot comparator with a streaming parser for multi-GB .heapsnapshot files
- `fb-metro-cli/memory-investigation/heap-injector.js`: optional in-process module exposing /memory, /gc, /snapshot HTTP endpoints
- `metro/scripts/profile-memory.sh`: quick RSS-only profiling via /proc
- `fb-metro-cli/memory-investigation/compact-bench-measure.js`: one measurement cycle; builds WildeBundle, then requests WildeBundle.map, recording memory (RSS/heap) + build CPU + .map serialize CPU via CDP
- `fb-metro-cli/memory-investigation/run-compact-bench.sh`: orchestrator; fresh Metro per repeat across three configs (base / compact_flat / compact_indexed), cold or warm cache
- `fb-metro-cli/memory-investigation/compact-bench-stats.js`: Welch t-test analysis between any two configs
- `fb-metro-cli/memory-investigation/README.md`, `compact-sourcemaps-benchmark-results.md`: full writeup of methodology and results

**Baseline results (WildeBundle, June 2025):**

- Startup: 819 MB RSS / 426 MB heap used
- Post-build: 2,338 MB RSS / 1,549 MB heap used (+1,122 MB heap)
- Post-delete: 507 MB heap used (DELETE frees 93% of build growth)
- Arrays dominate: 10M Array objects + backing stores = 858 MB (77% of growth)
- Source maps stored as decoded number-tuple arrays are the primary consumer: ~678 MB, 60% of build growth (9,866,476 tuples across 16,562 modules)

**Compact source maps: end-to-end benchmark (n=3, WildeBundle):**

Three configs: `base` (decoded tuples), `compact_flat` (VLQ storage, flat .map), `compact_indexed` (VLQ storage, indexed passthrough .map).

- Memory (both compact configs): heap −51% cold / −53% warm; RSS −48% (1654→810 MB heap cold; all Welch p < 1e-5).
- Build CPU: unchanged cold; ~20% faster warm with compact storage.
- Serialize CPU (`.map` request): `compact_flat` +18% vs base (decode + re-encode), `compact_indexed` −49% vs base (passthrough).

Flat .map is byte-identical to base; indexed .map is +3.4% larger. Bundle output is byte-identical across all configs. Full tables in `compact-sourcemaps-benchmark-results.md`.

Differential Revision: D107879392
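The compact storage described above relies on base64 VLQ, the encoding used by the `mappings` field of Source Map v3. A minimal sketch of that encoding, assuming decoded segments in the `[genCol, srcIdx, srcLine, srcCol, nameIdx?]` layout: each field is stored as a delta (the generated column resets per line; the other fields carry across lines), with the sign in the lowest bit and 5 data bits per base64 character. `encodeVlq`, `encodeMappings`, and the example input are illustrative and are not Metro's compact implementation.

```javascript
// Illustrative Source Map v3 base64 VLQ encoder; not Metro's compact storage code.
const BASE64 =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encodes one signed integer (assumes |value| < 2^30 so the shift stays in range).
function encodeVlq(value) {
  let vlq = value < 0 ? (-value << 1) | 1 : value << 1; // sign in the lowest bit
  let out = '';
  do {
    let digit = vlq & 31; // low 5 bits
    vlq >>>= 5;
    if (vlq > 0) digit |= 32; // continuation bit: more digits follow
    out += BASE64[digit];
  } while (vlq > 0);
  return out;
}

// Encodes decoded lines of segments into a `mappings` string.
// Segments: [genCol] | [genCol, srcIdx, srcLine, srcCol] | [..., nameIdx]
function encodeMappings(lines) {
  const prev = {source: 0, line: 0, column: 0, name: 0}; // carried across lines
  return lines
    .map(segments => {
      let prevGenCol = 0; // generated column resets on every line
      return segments
        .map(seg => {
          let s = encodeVlq(seg[0] - prevGenCol);
          prevGenCol = seg[0];
          if (seg.length >= 4) {
            s += encodeVlq(seg[1] - prev.source);
            s += encodeVlq(seg[2] - prev.line);
            s += encodeVlq(seg[3] - prev.column);
            prev.source = seg[1];
            prev.line = seg[2];
            prev.column = seg[3];
          }
          if (seg.length === 5) {
            s += encodeVlq(seg[4] - prev.name);
            prev.name = seg[4];
          }
          return s;
        })
        .join(',');
    })
    .join(';');
}

console.log(encodeMappings([[[0, 0, 0, 0], [4, 0, 0, 4, 0]], [], [[2]]]));
// prints AAAA,IAAIA;;E
```

Small deltas fit in a single character, which is why a VLQ string is far denser than one JS array per segment.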


meta-codesync · 2026-06-23T14:55:11Z

@robhogan has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108506323.

Reinaldotec

Testing

robhogan added 2 commits June 23, 2026 07:54

meta-cla Bot added the CLA Signed label Jun 23, 2026

meta-codesync Bot added the meta-exported label Jun 23, 2026

Reinaldotec reviewed Jun 23, 2026

robhogan wants to merge 2 commits into `main` from `export-D108506323`
